Jensen inequality
- Europe > Italy > Friuli Venezia Giulia > Trieste Province > Trieste (0.04)
- Europe > France > Île-de-France > Paris > Paris (0.04)
- Europe > France > Hauts-de-France > Nord > Lille (0.04)
- Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.04)
- North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
- North America > United States > California > Los Angeles County > Long Beach (0.04)
- North America > United States > Kansas > Rawlins County (0.04)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.66)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.48)
Graphon Particle Systems, Part II: Dynamics of Distributed Stochastic Continuum Optimization
We study the distributed optimization problem over a graphon with a continuum of nodes, regarded as the limit of distributed networked optimization as the number of nodes goes to infinity. Each node has a private local cost function, and the global cost function, which all nodes cooperatively minimize, is the integral of the local cost functions over the node set. We propose stochastic gradient descent and gradient tracking algorithms over the graphon. We establish a general lemma that bounds a class of time-varying differential inequalities with negative linear terms, from which we prove that for both algorithms the second moments of the nodes' states are uniformly bounded. In particular, for the stochastic gradient tracking algorithm, we transform the convergence analysis into the asymptotic analysis of coupled nonlinear differential inequalities with time-varying coefficients and develop a decoupling method. For both algorithms, we show that with properly chosen time-varying gains, all nodes' states achieve $\mathcal{L}^{\infty}$-consensus for a connected graphon. Furthermore, if the local cost functions are strongly convex, then all nodes' states converge to the minimizer of the global cost function, and the auxiliary states of the stochastic gradient tracking algorithm converge uniformly in mean square to the gradient of the global cost function at the minimizer.
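A minimal sketch of the idea on an N-node discretization of a graphon, not the paper's continuum algorithm: the graphon kernel, quadratic local costs, noise level, and decaying gains below are all illustrative assumptions. It shows stochastic gradient tracking with consensus coupling given by sampled graphon weights.

```python
import numpy as np

# Hypothetical setup: local costs f_i(x) = 0.5 * (x - c_i)^2, so the global
# minimizer is mean(c). Gradients are corrupted with Gaussian noise.
rng = np.random.default_rng(0)
N, T = 200, 4000
u = (np.arange(N) + 0.5) / N                    # node labels in [0, 1]
A = lambda s, t: 0.25 * (s + t) + 0.5           # an assumed connected graphon
W = A(u[:, None], u[None, :]) / N               # sampled coupling weights
c = rng.normal(size=N)                          # targets of the local costs
x = rng.normal(size=N)                          # primal states

def noisy_grad(x):
    return (x - c) + 0.1 * rng.normal(size=N)   # stochastic local gradients

g = noisy_grad(x)
y = g.copy()                                    # auxiliary tracking states
for t in range(T):
    alpha = 5.0 / (t + 50)                      # decaying consensus gain
    beta = 1.0 / (t + 50)                       # decaying gradient gain
    # Consensus (graphon Laplacian) step plus descent along tracked gradient.
    x = x + alpha * (W @ x - W.sum(1) * x) - beta * y
    g_new = noisy_grad(x)
    # Tracking update: sum over nodes of y stays equal to the sum of the
    # current local gradients, since the symmetric Laplacian term sums to 0.
    y = y + alpha * (W @ y - W.sum(1) * y) + g_new - g
    g = g_new

print("consensus spread :", np.ptp(x))          # should shrink toward 0
print("error vs minimizer:", abs(x.mean() - c.mean()))
```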
- Asia > China > Shanghai > Shanghai (0.04)
- North America > United States > New York (0.04)
- Asia > Singapore (0.04)
- (4 more...)
Loss function based second-order Jensen inequality and its application to particle variational inference
Futami, Futoshi, Iwata, Tomoharu, Ueda, Naonori, Sato, Issei, Sugiyama, Masashi
Bayesian model averaging, obtained as the expectation of a likelihood function under a posterior distribution, has been widely used for prediction, uncertainty evaluation, and model selection. Various approaches have been developed to efficiently capture the information in the posterior distribution; one such approach optimizes a set of models simultaneously, with interactions that ensure the diversity of the individual models, in the same way as ensemble learning. A representative approach is particle variational inference (PVI), which uses an ensemble of models as an empirical approximation of the posterior distribution. PVI iteratively updates each model with a repulsion force to ensure the diversity of the optimized models. However, despite its promising performance, a theoretical understanding of this repulsion and its association with the generalization ability remains unclear. In this paper, we tackle this problem in light of PAC-Bayesian analysis. First, we provide a new second-order Jensen inequality whose repulsion term is based on the loss function; thanks to this term, it is tighter than the standard Jensen inequality. Then, we derive a novel generalization error bound and show that it can be reduced by enhancing the diversity of the models. Finally, we derive a new PVI that directly optimizes the generalization error bound. Numerical experiments demonstrate that the performance of the proposed PVI compares favorably with existing methods.
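A minimal sketch of PVI-style updates with a repulsion force, using Stein variational gradient descent (SVGD) as a representative PVI method; this is not the generalization-bound-driven update proposed in the paper, and the target, bandwidth, and step size are assumptions for illustration. Particles approximate a 1-D standard normal posterior.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 50
x = rng.uniform(-4.0, -2.0, size=n)             # deliberately biased init

def svgd_step(x, step=0.1, h=0.5):
    """One SVGD update toward p(x) = N(0, 1), with a fixed RBF bandwidth."""
    score = -x                                  # grad log p for N(0, 1)
    diff = x[:, None] - x[None, :]              # diff[i, j] = x_i - x_j
    k = np.exp(-diff**2 / h)                    # RBF kernel k(x_j, x_i)
    # Attraction: kernel-weighted scores pull particles toward high density.
    attract = k @ score
    # Repulsion: the kernel gradient pushes particles apart, preserving the
    # ensemble diversity that the paper's analysis ties to generalization.
    repulse = (2.0 / h) * (k * diff).sum(axis=1)
    return x + step * (attract + repulse) / n

for _ in range(1000):
    x = svgd_step(x)

print("particle mean:", x.mean())               # should approach 0
print("particle std :", x.std())                # should approach 1
```

Without the `repulse` term the particles would all collapse onto the posterior mode; the repulsion is what makes the ensemble spread out to represent posterior uncertainty.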
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.93)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.66)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.48)
Better Approximate Inference for Partial Likelihood Models with a Latent Structure
Setlur, Amrith, Póczós, Barnabás
Temporal Point Processes (TPPs) with partial likelihoods involving a latent structure often entail an intractable marginalization, making inference hard. We propose a novel approach to Maximum Likelihood Estimation (MLE) that performs approximate inference over the latent variables by minimizing a tight upper bound on the approximation gap. Given a discrete latent variable $Z$, the proposed approximation reduces inference complexity from $O(|Z|^c)$ to $O(|Z|)$. We use convex conjugates to determine this upper bound in closed form and show that adding it to the optimization objective improves results for models that assume proportional hazards, as in Survival Analysis.
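A minimal sketch of the convex-conjugate (Fenchel) principle such bounds build on, not the paper's TPP-specific construction: for scores $a_z$ over a discrete latent $Z$, log-sum-exp has the variational form $\mathrm{logsumexp}(a) = \max_q \langle q, a\rangle + H(q)$ over the simplex, so any fixed surrogate $q$ gives a bound whose gap is exactly $\mathrm{KL}(q \,\|\, \mathrm{softmax}(a))$.

```python
import numpy as np
from scipy.special import logsumexp, softmax

rng = np.random.default_rng(2)
a = rng.normal(size=6)                          # scores over |Z| = 6 states

def variational_value(q, a):
    entropy = -(q * np.log(q)).sum()            # H(q)
    return q @ a + entropy                      # <q, a> + H(q)

exact = logsumexp(a)                            # intractable term, in general
q_opt = softmax(a)                              # maximizer: the true posterior
q_bad = np.full(6, 1.0 / 6.0)                   # a crude uniform surrogate

print("exact   :", exact)
print("optimal :", variational_value(q_opt, a))  # equals exact
print("uniform :", variational_value(q_bad, a))  # lower by the KL gap
print("gap     :", exact - variational_value(q_bad, a))
print("KL check:", (q_bad * (np.log(q_bad) - np.log(q_opt))).sum())
```

Evaluating the variational form for a single surrogate $q$ costs $O(|Z|)$, whereas the exact marginalization over coupled latent variables can cost $O(|Z|^c)$; minimizing the gap over $q$ is what tightens the bound.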
- North America > Canada > Ontario > Toronto (0.14)
- North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
- Europe > Romania > Nord-Vest Development Region > Bihor County > Oradea (0.04)
- Europe > Netherlands > South Holland > Rotterdam (0.04)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.69)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.55)